COMPUTER VISION AND PATTERN RECOGNITION

[554SM]
a.a. 2024/2025

First semester

Attendance: not mandatory

  • 6 CFU
  • 48 hours
  • English
  • Trieste
  • Optional
  • Standard teaching
  • Oral Exam
  • SSD ING-INF/04
Curricula: COMPUTATIONAL MODELING AND DIGITAL TWINS
Syllabus

KNOWLEDGE AND UNDERSTANDING
The student, at the end of the course, should know the principles of
computer vision and pattern recognition, with particular reference to:
- image formation;
- feature detection and description;
- stereopsis;
- object detection in images;
- application of machine learning techniques to the above topics.
APPLYING KNOWLEDGE AND UNDERSTANDING
The student should be able to implement, at prototype level, basic algorithms of computer vision and pattern recognition.
MAKING JUDGEMENTS
The student should be able to identify the most appropriate techniques,
among those learned, to address a problem of computer vision and/or
pattern recognition.
COMMUNICATION SKILLS
The student should be able to describe in a clear and plain way the
functioning of a computer vision algorithm with the correct use of
technical vocabulary.
LEARNING SKILLS
The student should be able to read and understand reference textbooks
on computer vision and pattern recognition.

CALCULUS:
• derivatives, directional derivatives and gradients;
• integrals;
• exponentials.
LINEAR ALGEBRA AND GEOMETRY:
• linear systems of equations;
• least-squares solution of over- and under-determined linear systems;
• eigenvalue decomposition;
• singular value decomposition;
• positive definite matrices;
• rigid transformations.
A linear algebra review and a rigid transformation review are made available as supplementary material.
MACHINE LEARNING:
• basic concepts;
• linear classifiers;
• principal component analysis.
OPTIMIZATION AND OPERATIONS RESEARCH:
• Lagrange multipliers;
• linear and quadratic programming.

Introduction to Computer Vision
Image formation
Image processing
Feature detection
Fitting geometric primitives
Support vector machines
Recognition
Deep Learning for Computer Vision
Camera calibration
Stereopsis

Antonio Torralba, Phillip Isola and William T. Freeman. Foundations of Computer Vision.
MIT Press, 2024.
Szeliski, Richard. Computer vision: algorithms and applications, 2nd edition. Springer Cham, 2022.
Forsyth, David A., and Jean Ponce. Computer vision: a modern approach,
2nd edition. Pearson, 2012.
Fusiello, Andrea. Visione computazionale. Tecniche di ricostruzione tridimensionale. FrancoAngeli, 2018
Klette, Reinhard. Concise computer vision. Springer, 2014.


Introduction to computer vision. Image formation. The pinhole camera.
Image processing. Linear filtering. Convolution and correlation. Non-linear filters. Neighborhood operators. Image warping. Multi-resolution representations.
Feature detection and matching. Key-point detection. Harris detector. Hessian detector. Scale-space representation. Laplacian detector. SIFT detector. Descriptors. SIFT, MOPS, PCA-SIFT, GLOH descriptors. Feature matching. Edge detection. Boundary detection. Signatures. Earth mover’s distance.
Fitting. Hough transform. Generalized Hough transform. Robust estimators. Random sample consensus.
Camera model. Camera calibration. Direct linear transform. Zhang’s method.
Stereopsis. Triangulation. Epipolar geometry. Fundamental matrix. Epipolar rectification. Relative pose. Essential matrix. Eight-point algorithm.
Support vector machines. Binary classification. Primal and dual formulation. Soft margin. Multi-class classification. Incorporating a-priori knowledge.
Object recognition. Window-based detection. Viola-Jones detector. Histograms of oriented gradients detector. Instance recognition. Eigenfaces. Fisherfaces. Instance recognition from local features. Image retrieval based on visual dictionary. Category recognition using bag-of-words. Spatial pyramid kernel.
Category recognition using convolutional neural networks (CNNs). Generalities on CNNs. Architecture, training. Dropout. Batch normalization. Case studies: VGG, ResNet, YOLO. Transfer learning. Specialized neural architectures for object detection and other advanced tasks.

Lectures (80%) and hands-on laboratory (20%).
The lectures are based on teaching material provided by the lecturer on the Moodle platform (https://moodle2.units.it/course/search.php?search=554sm). The slides are made available before each lecture. They are designed to be self-contained, so no further material is required (the suggested readings and reference textbooks are not mandatory).
The lab lectures are interactive: with the help of the instructor, and using their own laptop and a programming language such as Python or Matlab, the students solve CV problems such as estimating homographies, detecting and describing keypoints, and classifying images using convolutional neural networks. The starter code, images and datasets for the lab lectures are also made available on Moodle. The purpose of the lab lectures is to improve the understanding of the concepts, rather than to teach the tools for developing CV applications.

*Frequently Asked Questions*

Q: What is the course about?
A: The course covers topics in Computer Vision (CV), focusing on both the geometrical (3D vision) and the detection/recognition aspects. The name of the course reflects this dual objective.

Q: Is it a course about "cool new stuff" and very recent research results?
A: No. The course should be considered an introductory one. The aim of the course is to provide solid theoretical and methodological background upon which one can build further competences.

Q: By only attending the course, will I become a "CV developer"?
A: No. Being able to develop CV applications requires (in addition to what you will learn during the course) programming skills and the knowledge of specific software tools.

Q: Is it a course on machine learning (ML)?
A: No, although we will make extensive use of ML concepts and tools, and will study Support Vector Machines and Convolutional Neural Networks in some detail.

Q: Is it a course on Deep Learning (DL)?
A: No, although we will recall some DL concepts to keep the course self-contained. We will also employ DL and study some specific network architectures used in CV.

Q: Is it a course on optics/image formation?
A: No, but some basic concepts of optics are necessary to understand how the images are formed, and how they can be analyzed to infer properties of the 3D objects they represent.

Q: Why do the slides refer to scientific literature that dates back to the 1980s (and even earlier)?
A: Because having a historical perspective is important. Novel approaches, in CV, are often the result of combining some old (but good) ideas and concepts with new ideas or new technologies.

Q: Why does the course waste time with "classical" CV tools (meaning tools of the pre-DL era)?
A: Studying classical CV is by no means a waste of time, because:
- some problems (in particular, inferring with high precision geometric properties of objects) have efficient and effective non-DL solutions;
- many very recent DL architectures are actually inspired by the classical methods.

### Final Exam Overview

The final exam comprises three components:

1. **Project**
2. **Written Examination**
3. **Oral Examination**

#### Grading Breakdown
- **Project:** 0 to 6
- **Written Examination:** 0 to 18
- **Oral Examination:** 0 to 6

The total score is the sum of the three components. To pass the exam, students must achieve a minimum total score of 18, with at least:
- 3 in the Project
- 9 in the Written Examination
- 3 in the Oral Examination

Both the written and oral examinations will be conducted during the same exam session.
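
The pass rule above combines a threshold on the total with a threshold on each component, so meeting every per-component minimum (3 + 9 + 3 = 15) is not sufficient on its own. The following is a minimal sketch of that arithmetic in Python. The function name `exam_result` and the dictionary layout are hypothetical, introduced only for illustration; this is not an official grading tool.

```python
# Illustrative sketch of the grading rule described above.
# All names here (MAX_SCORE, MIN_SCORE, exam_result) are hypothetical.

MAX_SCORE = {"project": 6, "written": 18, "oral": 6}   # component ranges: 0..max
MIN_SCORE = {"project": 3, "written": 9, "oral": 3}    # per-component minimums
MIN_TOTAL = 18                                          # minimum total to pass


def exam_result(project, written, oral):
    """Return (total, passed) for the three component scores."""
    scores = {"project": project, "written": written, "oral": oral}

    # Reject scores outside the stated range of each component.
    for name, value in scores.items():
        if not 0 <= value <= MAX_SCORE[name]:
            raise ValueError(f"{name} score must be between 0 and {MAX_SCORE[name]}")

    total = sum(scores.values())

    # Passing requires BOTH the minimum total and every component minimum.
    meets_minimums = all(scores[name] >= MIN_SCORE[name] for name in scores)
    passed = total >= MIN_TOTAL and meets_minimums
    return total, passed


if __name__ == "__main__":
    print(exam_result(3, 9, 3))    # every minimum met, but total is only 15
    print(exam_result(6, 10, 2))   # total is 18, but oral is below its minimum
    print(exam_result(4, 11, 3))   # total is 18 and every minimum is met
```

In this sketch only the third call reports a pass: the first fails on the total, the second on the oral minimum.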

#### Project Details
The project involves designing, implementing, and presenting a solution to a computer vision or pattern recognition problem selected from a list provided by the lecturer. Projects may be completed individually or in groups and must be submitted before the exam date.

#### Written Examination Details
The written examination lasts 60 minutes and includes:
- 15 multiple-choice questions
- 1 essay question

#### Oral Examination Details
The oral examination typically lasts 20 to 30 minutes. It assesses the student's communication skills, knowledge, and understanding of the course material. The oral exam may also include a brief discussion about the project.

In any type of content produced by the student for admission to or participation in an exam (projects, reports, exercises, tests), the use of Large Language Model tools (such as ChatGPT and the like) must be explicitly declared. This requirement must be met even in the case of partial use.
Regardless of the method of assessment, the teacher reserves the right to further investigate the student's actual contribution with an oral exam for any type of content produced.

This course explores topics closely related to one or more goals of the United Nations 2030 Agenda for Sustainable Development (SDGs).

[Icon: SDG 9]